Below I do some exploratory data analysis with multichannel tiff files from a Vectra instrument. We are looking at a tissue slice from a patient with lung cancer. Paper reference here.
I do some light image manipulation as well.
library(tidyverse)
library(here)
#library(rgdal)
#library(tiff)
library(phenoptr)
knitr::opts_chunk$set(echo = TRUE,
warning = FALSE,
message = FALSE,
fig.width = 8,
fig.height = 6
)
theme_set(theme_bw() + theme(legend.position = "bottom"))
There are roughly 132 subjects and each subjects have 3-5 images. Only one image (there are multiple tiff files for the one image) from one subject is listed here.
The are collected on a Vectra instrument. Cell and tissue segmentation and cell phenotyping were performed using Inform software. For each image, there are multiple Tiff files:
*component_data.tif: the multichannel file we are most interested in*binary_sig*: 3 channel file with binary segmentation maps"subject_id".tif: A composite image in RGB format. Makes pretty pictures.## define your specific file path
my_path = here("data")
# load csv filepaths
filenames = list.files(my_path)
component = paste0(my_path, "/", "#16 3-171-807_[62243,18509]_component_data.tif")
binseg = paste0(my_path, "/", "#16 3-171-807_[62243,18509]_binary_seg_maps.tif")
composite_tif = paste0(my_path, "/", "#16 3-171-807_[62243,18509].tif")
The subject and image ID are embedded in the file name. For example, for the file “#16 3-171-807_[62243,18509].tif,”#16 3-171-807" is the subject id, and “62243,18509” is the image ID.
The phenoptr package has functions for reading in data from Vectra.For the component files phenoptr::read_components() is a wrapper around tiff::readTIFF() which reads the individual component images from a component_data.tif file. It keeps the full-resolution component images, extracts the component names from the image descriptions, and returns a named list of image matrices.
Code below loads one multichannel tiff file. (I think) it returns a list of length 8 because there are 8 channels in this image. Names and dimensions of each channel are output below.
image_stack = phenoptr::read_components(component)
map(image_stack, dim)
## $`CD19 (Opal 650)`
## [1] 1008 1348
##
## $`CD3 (Opal 520)`
## [1] 1008 1348
##
## $`CD14 (Opal 540)`
## [1] 1008 1348
##
## $`CD8 (Opal 620)`
## [1] 1008 1348
##
## $`HLADR (Opal 690)`
## [1] 1008 1348
##
## $`CK (Opal 570)`
## [1] 1008 1348
##
## $`DAPI (DAPI)`
## [1] 1008 1348
##
## $Autofluorescence
## [1] 1008 1348
ls(image_stack)
## [1] "Autofluorescence" "CD14 (Opal 540)" "CD19 (Opal 650)" "CD3 (Opal 520)"
## [5] "CD8 (Opal 620)" "CK (Opal 570)" "DAPI (DAPI)" "HLADR (Opal 690)"
Pull just the DAPI (nuclear) channel. This itself should be an image matrix.
dapi_channel = image_stack$"DAPI (DAPI)"
dim(dapi_channel)
## [1] 1008 1348
Component data may be displayed using plot.raster orEBImage::display`:
EBImage::display(t(dapi_channel))
The segmentation files are binary maps containing information about segmentation and either the tissue or cell level. The phenoptr::read_maps() function is a wrapper around readTIFF() which reads segmentation files. It returns a named list of integer-valued matrices. The list names reflect the content of the individual images, e.g. Nucleus, Cytoplasm, etc.
segmentations = phenoptr::read_maps(binseg)
names(segmentations)
## [1] "Nucleus" "Membrane" "Tissue"
In this case, we have segmentation maps of the nucleus, membrane, and tissue.
nucleus = segmentations[['Nucleus']]
dim(nucleus)
## [1] 1008 1348
Below is a histogram of nucleus values.
hist(nucleus)
table(nucleus[nucleus < 3])
##
## 0 1 2
## 1232040 104 113
Nuclei are plotted below:
plot(as.raster(nucleus, max = max(nucleus)))
Cytoplasm map is plotted below
membrane = segmentations[['Membrane']]
plot(as.raster(membrane, max = max(membrane)))
Tissue map is plotted below
tissue = segmentations[['Tissue']]
plot(as.raster(tissue, max = max(tissue)))
I believe 0 is stroma and 1 is tumor but it would be good to know this for sure. It would also be nice to overlay these images somehow.
hist(tissue)
What I’d like to do next is apply both the nuclear and tissue masks to the dapi channel, so that areas that are not healthy (non-tumor) nucleus retain their original value and the rest is black
dim(dapi_channel)
dim(nucleus)
head(dapi_channel[,1:10])
nucleus[nucleus>0] = 1
#table(nucleus)
#table(tissue)
tissue = 1- tissue
#table(tissue)
dapi_map = dapi_channel * nucleus * tissue
#dim(dapi_map)
plot(as.raster(tissue, max = max(tissue)))
plot(as.raster(dapi_map, max = max(dapi_map)))
What is the distribution of the nonzero dapi_map values?
hist(dapi_map[dapi_map > 0])